Dataset Shape Overview
This bar chart provides a simple overview of the dataset's structure by showing the total number of rows and columns. We started with 80,199 rows and 37 columns. After cleaning, we are left with 44,174 rows and 14 columns. Adding a price/m² column brings the total to 15 columns.
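A minimal pandas sketch of the two steps described above: adding the price-per-m² column and reading off the row and column counts that the bar chart displays. The column names `price` and `habitablesurface` and the three-row sample are assumptions for illustration. The actual cleaning steps are not shown here.

```python
import pandas as pd


def add_price_per_m2(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a price_per_m2 column (price divided by habitable surface)."""
    out = df.copy()
    out["price_per_m2"] = out["price"] / out["habitablesurface"]
    return out


# Hypothetical three-row sample standing in for the cleaned dataset.
cleaned = pd.DataFrame({
    "price": [250_000, 400_000, 180_000],
    "habitablesurface": [100, 160, 60],
})

enriched = add_price_per_m2(cleaned)

# df.shape is (rows, columns): the two values plotted in the bar chart.
n_rows, n_cols = enriched.shape
shape_summary = pd.Series({"rows": n_rows, "columns": n_cols})
print(shape_summary)
```

With the full dataset, `shape_summary.plot.bar()` (which requires matplotlib) would draw the rows-versus-columns chart.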
Distribution of All Properties by Surface Area
This histogram shows the distribution of all properties based on their habitable surface in m². It helps identify whether there are common property sizes and reveals any skewness in the data: many small apartments versus a few large houses. As the chart shows, a handful of extreme values skew the visual impression and make the chart almost unusable.
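The skew seen in the histogram can also be checked numerically by comparing the mean with the median and computing the sample skewness. This is a sketch on made-up surface values, assuming the column is named `habitablesurface`. A mean well above the median and a strongly positive skewness both point to a long right tail of very large properties.

```python
import pandas as pd

# Hypothetical surfaces in m²: mostly ordinary sizes plus one extreme value.
surfaces = pd.Series([50, 60, 70, 80, 90, 100, 110, 120, 2000], name="habitablesurface")

summary = {
    "mean": surfaces.mean(),
    "median": surfaces.median(),
    # Series.skew() returns the bias-corrected sample skewness; > 0 means a right tail.
    "skew": surfaces.skew(),
}
print(summary)
```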
Distribution of Surface Area, Without Outliers
This version of the previous chart removes the 1% of properties with the largest surface areas (everything above the 99th percentile). The goal is to focus on the majority of properties without distortion from a few very large ones, which gives a clearer view of typical property sizes. As the graph shows, most properties lie in the 100 m² to 200 m² range, with the peak at 100 m².
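The trimming step can be expressed as a quantile cutoff. A minimal sketch, assuming the surface column is named `habitablesurface` and that rows strictly above the 99th percentile are dropped. The exact rule used for the chart is not shown in the original.

```python
import pandas as pd


def drop_top_percentile(df: pd.DataFrame, column: str = "habitablesurface", q: float = 0.99) -> pd.DataFrame:
    """Keep only rows whose `column` value is at or below its q-th quantile."""
    cutoff = df[column].quantile(q)  # pandas default: linear interpolation between data points
    return df[df[column] <= cutoff]


# Hypothetical data: surfaces from 1 to 100 m².
demo = pd.DataFrame({"habitablesurface": range(1, 101)})
trimmed = drop_top_percentile(demo)
print(len(trimmed), trimmed["habitablesurface"].max())
```

On this toy range the cutoff is 99.01, so only the single largest value is dropped.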
Property Surface Area by Subtype
This interactive Plotly histogram shows the surface distribution from the previous chart, now split by property subtype. We still filter to the 99th percentile to remove extreme values. The subtypes are overlaid, which lets us compare them within the same space while still seeing where they overlap. We can also filter on subtype to see how the distribution differs between individual subtypes.
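An overlaid per-subtype histogram like this one can be built with `plotly.express.histogram` using `color` plus `barmode="overlay"`. The sketch below assumes a hypothetical `subtype` column and invented sample rows. Plotly is imported inside the plotting function so that the numeric companion (median surface per subtype) runs without it.

```python
import pandas as pd


def surface_histogram_by_subtype(df: pd.DataFrame, q: float = 0.99):
    """Overlaid Plotly histogram of habitable surface, one trace per subtype."""
    import plotly.express as px  # lazy import: only needed when drawing

    # Same 99th-percentile trim as the previous chart.
    trimmed = df[df["habitablesurface"] <= df["habitablesurface"].quantile(q)]
    return px.histogram(
        trimmed,
        x="habitablesurface",
        color="subtype",    # hypothetical column: one colour and legend entry per subtype
        barmode="overlay",  # draw subtypes on top of each other instead of stacking
        opacity=0.6,        # keep overlapping bars visible
    )


def median_surface_by_subtype(df: pd.DataFrame) -> pd.Series:
    """Numeric companion to the chart: median habitable surface per subtype."""
    return df.groupby("subtype")["habitablesurface"].median()


# Hypothetical sample rows.
sample = pd.DataFrame({
    "subtype": ["APARTMENT", "APARTMENT", "HOUSE", "HOUSE", "VILLA"],
    "habitablesurface": [70, 90, 150, 170, 300],
})
medians = median_surface_by_subtype(sample)
print(medians)
```

In a notebook, `surface_histogram_by_subtype(df).show()` renders the chart. Clicking a legend entry hides or shows that subtype's trace, which is one way to get the per-subtype filtering described above.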
Price Correlation Heatmap
In this heatmap we can see the correlation of every feature with every other feature and, most importantly, each feature's correlation with price.
All feature correlations with price:
- Price per m² (0.58)
- Bedroom Count (0.36)
- Swimming Pool (0.29)
- Building Condition (0.22)
- Habitable Surface (0.16):
  - Slight positive link: larger surface contributes to price, though less so than price per m².
  - This seemed very strange and counterintuitive to me, so I decided to investigate further in the following charts.
- Terrace (0.09)
- Lift (0.05)
- Garden (0.05)
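The ranked list above is the `price` column of a correlation matrix. A minimal sketch of how it can be computed with pandas (Pearson by default), using an invented sample. It assumes the `has*` flags are stored as 0/1 integers; `select_dtypes("number")` skips boolean columns, so those would need `.astype(int)` first.

```python
import pandas as pd


def correlations_with_price(df: pd.DataFrame, target: str = "price") -> pd.Series:
    """Pearson correlation of every numeric column with `target`, strongest first."""
    numeric = df.select_dtypes("number")  # note: bool columns are not included here
    corr_matrix = numeric.corr()          # full matrix: what the heatmap displays
    return corr_matrix[target].drop(target).sort_values(ascending=False)


# Hypothetical sample: bedroomcount rises with price, haslift does not.
sample = pd.DataFrame({
    "price": [100, 200, 300, 400],
    "bedroomcount": [1, 2, 3, 4],
    "haslift": [1, 0, 1, 0],
})
result = correlations_with_price(sample)
print(result)
```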
All feature correlations with price (FILTERED data):
- habitablesurface: 0.58
- price_per_m²: 0.51
- bedroomcount: 0.39
- hasswimmingpool: 0.27
- building_condition: 0.24
- hasterrace: 0.10
- hasgarden: 0.08
- haslift: 0.01
COMPARISON: Top 5 variables
ORIGINAL (unfiltered):
1. price_per_m2: 0.58
2. bedroomcount: 0.36
3. hasswimmingpool: 0.29
4. building_condition: 0.22
5. habitablesurface: 0.16
FILTERED:
1. habitablesurface: 0.58
2. price_per_m²: 0.51
3. bedroomcount: 0.39
4. hasswimmingpool: 0.27
5. building_condition: 0.24
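Pearson correlation is sensitive to extreme values, so a few very large or very expensive properties can pull a coefficient down, which may explain why habitable surface jumps from 0.16 to 0.58 after filtering. The sketch below reproduces this kind of before/after comparison on invented data. It assumes outliers are removed with a 99th-percentile cutoff on `price` and `habitablesurface`; the filter actually used for the figures above is not specified.

```python
import pandas as pd


def trim_outliers(df, columns=("price", "habitablesurface"), q=0.99):
    """Drop rows above the q-th quantile in any of `columns` (cutoffs taken from the unfiltered data)."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        mask &= df[col] <= df[col].quantile(q)
    return df[mask]


def compare_price_correlations(df, target="price"):
    """Side-by-side correlation with `target` before and after trimming."""
    before = df.corr()[target].drop(target)
    after = trim_outliers(df).corr()[target].drop(target)
    return pd.DataFrame({"original": before, "filtered": after})


# Hypothetical data: price is exactly 2 x surface, except for one huge but cheap outlier.
surface = [50, 60, 70, 80, 90, 100, 110, 120, 130, 5000]
price = [2 * s for s in surface[:-1]] + [150]
demo = pd.DataFrame({"price": price, "habitablesurface": surface})
comparison = compare_price_correlations(demo)
print(comparison)
```

Here the single outlier turns a perfect linear relationship into a negative coefficient, and trimming restores it.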
Top 5 Heatmap (FILTERED Data)
Top 5 most important features correlating with price, using the filtered data (outliers removed):
- habitablesurface: 0.58
- price_per_m²: 0.51
- bedroomcount: 0.39
- hasswimmingpool: 0.27
- building_condition: 0.24
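Selecting the top five and building the smaller heatmap comes down to `nlargest` followed by a correlation matrix restricted to those columns plus `price`. A sketch, using the filtered correlations reported above as input. The helper names are hypothetical.

```python
import pandas as pd


def top_k_features(price_corr: pd.Series, k: int = 5) -> list:
    """Names of the k features with the highest correlation to price, strongest first."""
    return price_corr.nlargest(k).index.tolist()


def top_k_corr_matrix(df: pd.DataFrame, features: list, target: str = "price") -> pd.DataFrame:
    """Correlation matrix restricted to the selected features plus the target (heatmap input)."""
    return df[features + [target]].corr()


# Filtered correlations with price, as reported above.
filtered_corr = pd.Series({
    "habitablesurface": 0.58, "price_per_m²": 0.51, "bedroomcount": 0.39,
    "hasswimmingpool": 0.27, "building_condition": 0.24,
    "hasterrace": 0.10, "hasgarden": 0.08, "haslift": 0.01,
})
top5 = top_k_features(filtered_corr)
print(top5)
```

The matrix returned by `top_k_corr_matrix` could then be drawn with Plotly's `px.imshow`.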
Scatterplots of Top 5 Features Correlating With Price
Here we can see each feature in the top five visualized with a scatterplot:
1. habitablesurface vs price:
   - Strongest visual correlation: larger surfaces lead to higher prices, as expected.
2. price_per_m2 vs price:
   - Positive correlation, but with high spread: the same price per m² can lead to vastly different total prices.
3. bedroomcount vs price:
   - Slight upward trend, but a lot of variance: more bedrooms don't guarantee a higher price.
4. hasswimmingpool vs price:
   - Properties with pools tend to be more expensive, but the overlap is large.
5. building_condition vs price:
   - Buildings in newer/better condition show slightly higher prices.
   - Still, there is a lot of price variance within each condition level; the effect is present but not dominant.
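One way to get all five scatterplots in a single figure is to reshape the data to long format (one row per property and feature) and facet on the feature name. This is a sketch, not necessarily how the original charts were made. The column names follow the list above, and the sample rows are invented.

```python
import pandas as pd

TOP5 = ["habitablesurface", "price_per_m2", "bedroomcount", "hasswimmingpool", "building_condition"]


def to_long(df: pd.DataFrame, features=TOP5, target: str = "price") -> pd.DataFrame:
    """Reshape to one row per (property, feature) so each feature gets its own panel."""
    return df.melt(id_vars=[target], value_vars=list(features), var_name="feature", value_name="value")


def top5_scatter(df: pd.DataFrame):
    """Faceted Plotly scatter: price against each top-5 feature."""
    import plotly.express as px  # lazy import: only needed when drawing

    fig = px.scatter(to_long(df), x="value", y="price", facet_col="feature", facet_col_wrap=3)
    fig.update_xaxes(matches=None)  # features have different scales, so unlink the x-axes
    return fig


# Hypothetical two-row sample.
sample = pd.DataFrame({
    "price": [300_000, 450_000],
    "habitablesurface": [120, 180],
    "price_per_m2": [2500.0, 2500.0],
    "bedroomcount": [2, 3],
    "hasswimmingpool": [0, 1],
    "building_condition": [3, 4],
})
long = to_long(sample)
print(long.head())
```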
Scatterplots of Top 5 Features Correlating With Price - HOUSE
Here we can see the same features, filtered to show only houses. With this filter the dataset becomes clearer, and we can draw more conclusions more easily and quickly:
1. price_per_m2 vs price:
   - Positive correlation; most prices cluster below €1.5M.
   - Wide spread in price per m² for lower-priced properties.
   - High-priced properties tend to have a lower price per m².
   - Slight inverse relationship overall: as total price increases, price per m² decreases.
2. bedroomcount vs price:
   - Slight trend of higher prices with more bedrooms, up to around 5.
   - After 5 bedrooms the trend flattens, with little further price increase.
3. hasswimmingpool vs price:
   - Houses with swimming pools are more commonly found at higher price points.
4. building_condition vs price:
   - Slight price increase with better building conditions.
5. habitablesurface vs price:
   - Clear positive correlation: larger surface = higher price.
   - Most properties fall within 0–400 m².
   - Significant price variability: small properties can still be very expensive.
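The observation that prices flatten after about 5 bedrooms can be checked numerically by comparing median prices per bedroom count within houses only. A sketch, assuming a hypothetical `type` column holding `HOUSE`/`APARTMENT`. The sample numbers are invented.

```python
import pandas as pd


def median_price_by_bedrooms(df: pd.DataFrame, property_type: str = "HOUSE") -> pd.Series:
    """Median price per bedroom count within one property type."""
    subset = df[df["type"] == property_type]  # hypothetical column with HOUSE / APARTMENT values
    return subset.groupby("bedroomcount")["price"].median()


# Hypothetical sample: six houses and one apartment (which the filter excludes).
sample = pd.DataFrame({
    "type": ["HOUSE"] * 6 + ["APARTMENT"],
    "bedroomcount": [2, 3, 4, 5, 6, 6, 2],
    "price": [300_000, 350_000, 420_000, 500_000, 480_000, 540_000, 250_000],
})
medians = median_price_by_bedrooms(sample)
print(medians)
```

Medians are used rather than means so that a few very expensive houses in one bedroom group do not dominate the comparison.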
Scatterplots of Top 5 Features Correlating With Price - APARTMENT
Here we can see the same features, filtered to show only apartments:
1. Price per m² vs Price:
   - Higher price per square meter generally corresponds to higher apartment prices.
   - Most apartments are clustered at lower price per m² values.
   - Outliers exist with extremely high price per m², linked to high overall prices.
   - Price variability increases as price per m² rises.
2. Bedroom Count vs Price:
   - No clear trend, but apartments with 2–4 bedrooms show wide price variation.
3. Swimming Pool vs Price:
   - Apartments with swimming pools tend to have significantly higher prices.
4. Building Condition vs Price:
   - No strong correlation: prices vary across all building condition levels.
5. Habitable Surface vs Price:
   - Larger habitable surfaces are generally associated with higher prices.
   - Most apartments fall into the smaller surface range.
   - A few large-surface apartments are priced notably higher.
   - Price variability grows with increasing surface area.
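The swimming-pool effect for apartments can be summarized as a count and median price per pool flag. Showing the count next to the median helps avoid over-reading a small group. A sketch, assuming a hypothetical `type` column and 0/1 values in `hasswimmingpool`. The sample rows are invented.

```python
import pandas as pd


def pool_price_summary(df: pd.DataFrame, property_type: str = "APARTMENT") -> pd.DataFrame:
    """Count and median price with and without a pool, within one property type."""
    subset = df[df["type"] == property_type]  # hypothetical column with HOUSE / APARTMENT values
    return subset.groupby("hasswimmingpool")["price"].agg(["count", "median"])


# Hypothetical sample: five apartments and one house (which the filter excludes).
sample = pd.DataFrame({
    "type": ["APARTMENT"] * 5 + ["HOUSE"],
    "hasswimmingpool": [0, 0, 0, 1, 1, 1],
    "price": [200_000, 250_000, 300_000, 600_000, 800_000, 900_000],
})
summary = pool_price_summary(sample)
print(summary)
```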